Progress in the past three months has occurred in two areas:
I. Reconstruction of ancestral proteins:

Reconstructions of ancient forms of life using "maximum parsimony" and "maximum likelihood"
methods offer a perspective on how contemporary life emerged, how its evolution is influenced by 
geological and cosmogenic events, and how new forms of life replace old forms of life by mass 
extinctions. Further, they allow one to approach the origin of life following a "present day backwards" 
strategy that is an alternative to prebiological experiments. One focus of this project is a "staged" 
reconstruction of ancient biomolecular structures, beginning with reconstructions that go only modestly 
back in time and continuing with reconstructions that are more ancient. This strategy reflects the fact that
more recent events are easier to reconstruct than more ancient events, as extinctions, rapid sequence 
evolution associated with functional adaptation, and non-Markovian processes (gene conversion, for 
example) cause the loss of information. Our strategy is to encounter and solve problems in this staging 
as they occur. 

A second focus has been to demonstrate "practical" applications of exobiological research. In 
addition to the obvious pragmatic reasons for doing this (funds from sources other than NASA will be 
needed to carry this line of research to completion, and mission-oriented funding agencies always seem 
to have more money than basic science agencies), it remains a worthwhile test of any new research 
strategy to challenge it to solve real-world problems. This prevents the research programme from 
degenerating into a deep, scholarly study covering a narrow scope with little breadth in interest. 

A. Ribonuclease (RNase). 

RNase has been used as a system to examine very recent reconstructions, primarily because it is a
protein with multiple paralogs (sister genes in the same organism) which can talk (and might have talked) to
each other via gene conversion (a non-Markovian process that undermines the evolutionary model that
underlies these reconstructions), and which clearly has undergone rapid sequence evolution in some of
its branches to acquire new biological functions. The protein family has now provided another
"practical" application of exobiological research: a reconstruction completed in the laboratory has
permitted the correlation of in vitro behavior of the protein to physiological function. The approach is
quite general, meaning that it should be possible to use analogous evolutionary reconstructions to 
identify physiologically relevant traits in many proteins. The exobiological research therefore addresses
one of the classic, unsolved questions in experimental biochemistry: "Does the behavior that I am studying
in vitro have any physiological relevance?" A paper will be prepared when the final controls are
completed, which we expect before the end of the contract period.

B. SH2 domains. 

During the evolution of metazoa, tyrosine phosphorylation evolved as a crucial component of signal 
transduction, allowing cells within an organism to communicate with each other. Src homology 2 
(SH2) domains are involved in recognizing phosphorylated tyrosine residues, allowing the signal 
generated from a binding event on a cell surface to be transduced to the nucleus where specific changes 
in gene expression result. These domains are small (~100 amino acid) peptide motifs that have
undergone major radiative divergence in metazoa. 

While one yeast protein, SPT6, shows weak homology to metazoan SH2 domains [1], multiple gene 
duplication events postdating the yeast-metazoan divergence but predating the nematode-chordate 
divergence have resulted in a wide array of SH2 domains bearing a wide range of binding specificities 
and signaling capabilities [2,3]. Thus, it appears as if the SH2 domains originated their function at the 
time of the Cambrian explosion, one of the most important (for humans) events in the biosphere. The 
binding specificity of an SH2 domain for its substrate is dependent upon the three to five residues
C-terminal to the phosphotyrosine. We have reconstructed the evolutionary history of the SH2 domain
and its signaling specificity to provide insight into the evolution of metazoan metabolism. 

All domains that had been identified as SH2 domains in the Swiss-Prot protein database (October 31,
1997 release) were downloaded and submitted to Darwin (a bioinformatics tool developed in these
laboratories) for a multiple alignment. Many of the SH2 domains showed weak similarity to human src 
that was not significant and were eliminated from the alignment. The multiple alignment obtained had 
many misplaced gaps and was realigned manually. These sequences were then placed in MacClade and 
a phylogenetic tree was generated using the principle of maximum parsimony, where a biological tree 
was utilized so long as it adhered to the maximum parsimony principle. Sequences resulting from 
Cambrian era gene duplications were branched solely based upon the principle of maximum parsimony. 

The SH2 domain offers many of the classic problems associated with intermediate antiquity 
reconstructions. Many of the sequences from these gene duplication events had diverged significantly 
from each other, making reconstruction of the universal ancestral SH2 domains essentially impossible 
given available sequences. Therefore, in this case, the expressed to silent ratio analysis was discarded 
as a measure of adaptive evolution and the study focused on one family of the tree, that containing
the Grb2 domain. 

Grb2 is an adapter protein containing both an SH2 domain and two SH3 domains. While 
homologues from D. melanogaster and C. elegans are functionally interchangeable at an organism 
viability level [4], the human, D. melanogaster, and C. elegans proteins show differences in binding 
specificity [2,3]. This evolution of binding specificity was taken as a lead to study the evolution of
binding specificity in this one subfamily dating back to the Cambrian explosion. This subfamily
contained several vertebrate sequences (human, mouse, rat, Xenopus, chicken, and human GRAP,
the last resulting from a gene duplication event that appears from its most parsimonious placement on the tree
to be chordate-specific [5,6]), as well as the D. melanogaster and C. elegans sequences. Human Nck
was the most closely related outgroup in the reconstruction of the last common ancestor of Grb2
homologues. The reconstruction from this tree contained several ambiguous residues. Attempts were 
made to resolve these using either DNA parsimony or one of several maximum likelihood techniques 
before deciding to clone the gene from additional invertebrate species to resolve the ambiguities. 
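
How an ambiguous ancestral residue arises under maximum parsimony can be illustrated with a minimal sketch. It uses Fitch's algorithm, a standard small-parsimony method that is swapped in here for illustration; it is not the procedure implemented in MacClade or Darwin. The four-taxon tree, the taxon labels, and the residues are hypothetical and are not data from this project.

```python
# Fitch small-parsimony pass for one alignment column on a fixed, rooted binary tree.
# Tree shape and residues are invented for illustration; they are not data from this report.

def fitch(node, leaf_states):
    """Return (candidate ancestral residues, minimum substitutions) for a subtree.

    A node is either a leaf name (str) or a (left, right) tuple of child nodes.
    """
    if isinstance(node, str):
        return {leaf_states[node]}, 0
    left_set, left_cost = fitch(node[0], leaf_states)
    right_set, right_cost = fitch(node[1], leaf_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one residue: no substitution is needed here.
        return shared, left_cost + right_cost
    # Children disagree: keep every candidate and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical four-taxon tree: (vertebrates) versus (invertebrates).
tree = (("human", "mouse"), ("fly", "worm"))

resolved_column = {"human": "K", "mouse": "K", "fly": "K", "worm": "R"}
ambiguous_column = {"human": "K", "mouse": "K", "fly": "R", "worm": "R"}

resolved = fitch(tree, resolved_column)
ambiguous = fitch(tree, ambiguous_column)

print("resolved site: ", sorted(resolved[0]), "substitutions:", resolved[1])
print("ambiguous site:", sorted(ambiguous[0]), "substitutions:", ambiguous[1])
```

In the second column two residues are equally parsimonious at the root. Additional sequences that subdivide the long invertebrate branches are one way such candidate sets can shrink, which is consistent with the decision to clone the gene from further invertebrate species.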

Short primers were developed from the C. elegans, D. melanogaster, and chordate SH3 sequences 
independently and were used to clone the gene from various cDNA samples. The chordate primers 
were initially used to pull a band of the expected size (~350 base pairs) from Ursus maritimus cDNA
(available in the lab from another project) as a positive control. Subsequently, bands of the expected size
were obtained from Geukensia demissa cDNA using the D. melanogaster primers and Lytechinus
variegatus cDNA using the chordate primers. These bands will be sequenced once the gene has been
cloned from additional invertebrate species. This should allow a faithful reconstruction of the Grb2 
subfamily of SH2 domains, synthesis of the ancient protein in the laboratory, and analysis of its binding 
specificity to provide insight into the evolution of metazoan signal transduction and its role in the 
evolution of multicellularity. 
II. Nucleic acid sequences

Earlier work on nucleic acid analogs in these laboratories has found resonance in the
exobiological community. There are two reasons for this. 

First, planetary exploration has almost reached the point where it is possible to retrieve samples of
Martian or Europan (as in the moon Europa) landscape in a form suitable for examination for chemical
remnants of life or (if lucky) the chemistry of actual non-terrestrial life. This possibility has focused
attention on the deficiencies in our knowledge of the chemical features of life that are likely to be 
universal regardless of genesis. The expansion of the genetic alphabet in these laboratories has shown 
that alternative chemical structures work, even within the context of DNA chemistry. Work with non- 
ionic analogs of nucleic acids has suggested specific limitations on this, however. 

Second, an explosion of papers and manuscripts has shown that those attempting to create catalytic 
molecules from standard nucleic acids (seeking evidence for the RNA world) are constraining their 
search in a non-historical (and probably unproductive) way. In our laboratory (manuscripts enclosed), 
and at NexStar [7], it has now been shown that the catalytic potential of nucleic acids can be 
significantly expanded by adding functionality to the standard bases, while Miller and his coworkers 
have shown that functionalized nucleosides might have been available prebiotically [8]. These results 
show that it will not be long (months or years, but not a decade) before self-replicating, evolving 
systems based on an expanded genetic alphabet will be available in the laboratory. These will begin an 
entirely new phase of research into life, one that builds cells from the bottom up. 

Two manuscripts are enclosed that address these two points. The first, prepared for a book on the 
RNA world being edited by Tom Cech, concerns universal chemical properties of genetic material 
capable of searching "mutation space" independent of concern of loss of properties essential for 
replication ("COSMIC-LOPER" behavior). The second reports the first in vitro selection experiment 
with a positively charged nucleotide analog. 

7. Tarasow, T.M.; Tarasow, S.L.; Eaton, B.E. Nature 1997, 389, 54-57.

8. Robertson, M.P.; Miller, S.L. Science 1995, 268, 702-705.
Abstract. 5-(3"-Aminopropynyl)-2'-deoxyuridine (J), a modified nucleoside with a side chain carrying
a cationic functional group, was incorporated into an oligonucleotide library, which was successfully 
amplified using the Vent DNA polymerase in a polymerase chain reaction (PCR). When coupled to an 
in vitro selection procedure, the PCR amplification generated receptors (aptamers) that bound ATP, 
ADP, and AMP. This is the first example where a polymerase chain reaction (PCR) has been applied to 
an oligonucleotide library containing modified nucleotides carrying positively charged functional 
groups. The outcome of the in vitro selection incorporating the cationic nucleotide was compared with 
that of a standard selection using only natural nucleotides, a procedure documented by Huizenga and 
Szostak (Biochemistry 1995, 34, 656-665). The amplification of the functionalized library generated a
motif containing J differing from the motif well known to arise in a standard selection experiment using 
only natural nucleotides. The motif containing J had an affinity for ATP ca. 2 orders of magnitude 
greater than that obtained from the standard selection, but no detectable affinity for ATP if thymidine 
replaces J. 

Key words. In vitro selection, DNA, aptamer, polymerase chain reaction, Selex 
In vitro selection is a combinatorial method that generates new receptors, ligands, and catalysts
from nucleic acid libraries containing as many as 10^15 different molecules. 1 It exploits (a) a "selection"
procedure to separate RNA or DNA molecules with specific binding or catalytic properties from those 
lacking these properties, (b) the polymerase chain reaction (PCR), which amplifies single selected 
RNA or DNA molecules to give a large number of their descendants, and (c) mutation, which allows 
the descendent molecules to "evolve" to improve their binding or catalytic activities. Using this 
approach, a variety of ligands, receptors, and catalysts have been generated. 2 

These successes notwithstanding, quantitative analysis of in vitro selection experiments shows 
that the "intrinsic" potential of RNA as both a receptor and a catalyst is poor, especially when 
compared with that displayed by polypeptides. 3 For example, to have a 50% chance of containing a 
single RNA molecule capable of catalyzing a template-directed ligation reaction by a modest (by protein 
standards) factor of 10,000, a library of RNA molecules 220 nucleotides in length must have 2 x 10^13
random sequences. 4 The limitation of nucleic acids appears to reside in the small number of encodable 
building blocks and little diversity in the functionality that they carry. Most notable in view of the 
polyanionic nature of RNA and DNA, standard nucleic acids have no encoded positive charge at 
neutral pH. This in turn limits the range of the binding and catalysis of products derived by in vitro 
selection. 
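
The library-size figure quoted above can be turned into a per-sequence hit frequency with a short calculation. This is a sketch under a stated assumption: that active sequences occur independently and rarely, so the number of hits in a library follows a Poisson distribution. That model and the function name are illustrative choices; they are not taken from the cited analysis.

```python
import math

def hit_probability(library_size, hit_frequency):
    """Chance that a random library contains at least one active sequence.

    Assumes rare, independent hits (Poisson model), an assumption made here
    for illustration only.
    """
    return -math.expm1(-library_size * hit_frequency)

N_HALF = 2e13      # library size quoted in the text for a 50% chance
LENGTH = 220       # nucleotides, as quoted in the text

# Hit frequency implied by "50% chance at N_HALF" under the Poisson assumption.
implied_frequency = math.log(2) / N_HALF

# Size of the full sequence space for comparison (four bases per position).
log10_sequence_space = math.log10(4 ** LENGTH)

print(f"implied hit frequency: {implied_frequency:.2e} per sequence")
print(f"sequence space: about 10^{log10_sequence_space:.0f} sequences")
print(f"P(hit) for a 10^15 library: {hit_probability(1e15, implied_frequency):.6f}")
```

Under this model roughly one sequence in 3 x 10^13 is active, and any library that can actually be synthesized samples only a vanishing fraction of the 220-mer sequence space.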

While many non-standard 5 and functionalized derivatives of nucleic acids have been prepared, 6
and still more occur naturally, 2 it remains a challenge to develop in vitro selection protocols that
incorporate them. Even modest discrimination by a polymerase against a functionalized nucleotide may
lead to a selective loss of the functionality during PCR amplification. While polymerases are known to
incorporate nucleotides carrying neutral functionality, 8 and two in vitro selection experiments have
incorporated these, 9 it has proven especially difficult to successfully amplify by PCR oligonucleotides 
containing nucleotide analogs carrying positively charged functionality, the functionality most notably 
missing from standard DNA and RNA. 
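
Why even modest polymerase discrimination matters can be seen from a simple compounding estimate. The sketch below assumes a constant per-cycle copying efficiency for the functionalized template relative to an unmodified competitor, and ignores plateau effects and selection pressure. The efficiency values and the cycle count are hypothetical; they are not measurements from this work.

```python
def relative_representation(per_cycle_efficiency, cycles):
    """Abundance of a disfavoured template relative to a fully efficient competitor.

    Assumes the same relative efficiency in every cycle (an illustrative
    simplification).
    """
    return per_cycle_efficiency ** cycles

TOTAL_CYCLES = 100  # hypothetical total over a multi-round selection

for efficiency in (0.99, 0.95, 0.90):
    remaining = relative_representation(efficiency, TOTAL_CYCLES)
    print(f"relative efficiency {efficiency:.2f}: {remaining:.2e} of competitor level")
```

A 5% per-cycle disadvantage, which would be hard to see in a single amplification, compounds to a loss of more than two orders of magnitude over this many cycles.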

To surmount this obstacle, 5-(3"-aminopropynyl)-2'-deoxyuridine 10 (Figure 1), trivially
designated as "J", was prepared by coupling N-propynyltrifluoroacetamide 11,12 with
2'-deoxy-5-iodouridine (Sigma) with the aid of a palladium catalyst, and converted to the corresponding
triphosphate following a procedure of Ludwig and Eckstein. 13 The side chain ammonium ion, with a
pKa of ca. 9.5, carries a positive charge at physiological pH. In preliminary experiments, JTP was
examined as a substrate for a range of thermostable polymerases. As has been observed with previous 
experiments with non-standard bases, 14 different polymerases behaved quite differently with respect to 
the unnatural modification. Taq polymerase, for example, incorporates ATP opposite J in the template, 
but stops rather than incorporate JTP opposite A in a template. Tth and Tfl polymerases pause when 
encountering J both in a template and as a triphosphate. Vent polymerase, however, incorporates both 
JTP opposite A and ATP opposite J without pausing. Based on these experiments, Vent polymerase 
was chosen as a tool for incorporating J into a PCR experiment.
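
The statement that the side-chain ammonium group (pKa ca. 9.5) is charged at physiological pH can be checked with the Henderson-Hasselbalch relation. This is a minimal sketch; taking pH 7.4 as "physiological" is an assumption, and the calculation treats the group as an isolated acid, ignoring any pKa shift within an oligonucleotide.

```python
def fraction_protonated(ph, pka):
    """Fraction of an amine present as the ammonium cation (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

PKA_SIDE_CHAIN = 9.5  # approximate value quoted in the text

for ph in (7.0, 7.4, 9.5):
    charged = fraction_protonated(ph, PKA_SIDE_CHAIN)
    print(f"pH {ph:.1f}: {100 * charged:.1f}% protonated")
```

At pH 7.4 more than 99% of the side chains carry the positive charge; only near pH 9.5 does the charged fraction fall to one half.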

A series of experiments were then run to show that Vent was able to amplify oligonucleotides 
containing the positively charged nucleobase. Parallel PCR experiments were run with natural 
deoxynucleoside triphosphate mixes and mixes having TTP replaced by JTP (Figure 2). In both cases, 
amplification was seen, with the JTP supporting PCR amplification only slightly less well than TTP 
with the Vent polymerase. 

To establish the value of the PCR amplification using JTP, an in vitro selection experiment was 
run incorporating it into a library following the procedure of Huizenga and Szostak, 15 who used in 
vitro selection experiments with natural triphosphates to obtain receptors for ATP. A parallel in vitro 
selection was run with JTP replacing TTP in the triphosphate mix. Nine rounds of selection, each
separated by 10-14 cycles of amplification, were performed by passing successively enriched libraries
through a column with immobilized ATP, ADP, and AMP. 16 Products were cloned and the
clones sequenced.

These in vitro selection experiments using the J-containing DNA (J-DNA) library were 
benchmarked against parallel selection experiments using a library containing only standard DNA 
nucleotides. As these parallel standard DNA selection experiments followed the work of Huizenga and 
Szostak, they should generate a known motif that binds adenosine derivatives. 15 Accordingly, of 17 
sequences isolated from the parallel selection with standard DNA, 11 showed a binding motif
previously isolated by Huizenga and Szostak (motif A). 

In the J-DNA selections, however, only 11 of the 39 sequences contained motif A, and several of 
these sequences originated from the same parent molecule. Another motif was found in 19 of the J- 
DNA sequences. This novel motif was built from a conserved 13-mer and a conserved 3-mer (motif 
B). Motif B was then prepared with unfunctionalized T replacing J. In this form, motif B does not bind 
ATP, suggesting that the cationic functionality carried by J is essential for the binding properties of the 
new motif. 

These results show that a nucleotide derivative J carrying a functionalized side chain bearing a 
full positive charge at physiological pH can be accepted as a component of a polymerase chain reaction. 
The presence of J in defined motifs obtained from multiple clones shows that incorporation was at a 
defined sequence position. 

These results also show that the PCR can be used as a part of an in vitro selection experiment to 
generate functionalized aptamers. It is intriguing to compare the results obtained with the functionalized
library with those obtained by Huizenga and Szostak 15 and Sassanfar and Szostak 17 from
unfunctionalized DNA and RNA libraries (respectively). Using the gel filtration method of Sassanfar
and Szostak, 17 an equilibrium dissociation constant (Kd) of ca. 40 nM was measured for the binding
of ATP to the functionalized aptamer whose sequence is shown in Figure 3. This value is
approximately 2 orders of magnitude lower than the 6-8 µM dissociation constants reported for the
binding of a representative example of the consensus motif A obtained from unfunctionalized
standard DNA. 15-17 This suggests that introduction of a cationic functionality (an ammonium group
bearing a positive charge) enhances the "intrinsic" affinity of a receptor obtained from a DNA library 
by ca. two orders of magnitude. While more experiments must be done with more targets, it appears 
that the addition of the cationic functionality provided by an ammonium ion does indeed improve the 
quality (as measured in terms of affinity) of the aptamer product. 
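
The size of the affinity gain can be worked through numerically. The sketch below takes the two dissociation constants quoted above (ca. 40 nM for the J-containing aptamer, and the lower end of the 6-8 micromolar range for the standard motif) and assumes simple 1:1 binding with ligand in excess. The temperature and the example ATP concentration are assumptions introduced for illustration.

```python
import math

KD_J_APTAMER = 40e-9   # M, J-containing motif B (value quoted in the text)
KD_STANDARD = 6e-6     # M, lower end of the range quoted for motif A
GAS_CONSTANT = 8.314   # J / (mol K)
TEMPERATURE = 298.15   # K, assumed; the text does not state a temperature

def fraction_bound(ligand_conc, kd):
    """Fraction of aptamer occupied for 1:1 binding with ligand in excess."""
    return ligand_conc / (ligand_conc + kd)

affinity_ratio = KD_STANDARD / KD_J_APTAMER
orders_of_magnitude = math.log10(affinity_ratio)
# Difference in binding free energy implied by the ratio of dissociation constants.
ddg_kj_per_mol = GAS_CONSTANT * TEMPERATURE * math.log(affinity_ratio) / 1000.0

ATP_CONC = 1e-6  # M, hypothetical concentration chosen for the comparison

print(f"Kd ratio: {affinity_ratio:.0f} ({orders_of_magnitude:.1f} orders of magnitude)")
print(f"free energy difference: {ddg_kj_per_mol:.1f} kJ/mol")
print(f"bound at 1 uM ATP, J aptamer: {fraction_bound(ATP_CONC, KD_J_APTAMER):.2f}")
print(f"bound at 1 uM ATP, standard:  {fraction_bound(ATP_CONC, KD_STANDARD):.2f}")
```

With these inputs the ratio of dissociation constants is 150, consistent with the "ca. two orders of magnitude" stated in the text.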
Figure 1. 5-(3"-Aminopropynyl)-2'-deoxyuridine ("J"), a modified nucleoside with a side chain
carrying a cationic functional group. 

Figure 2. Results of parallel polymerase chain reaction (PCR) amplifications of an oligonucleotide 
library incorporating the natural nucleoside triphosphates (ATP, CTP, GTP and TTP, bottom) or a 
functionalized mix of ATP, CTP, GTP, and the triphosphate of 5-(3"-aminopropynyl)-2’-deoxyuridine 
(J, top). Left and right hand lanes show molecular weight markers, while the middle lanes are for the 
cycle number indicated. An extended 5’-primer (51-mer, with the sequence 
CCGATTGAATCCTAGATCGCATGCTACTGATGACTGTGTAAGCTTGAGCAT) was used to 
create a gel mobility shift to distinguish the amplified template from the non-amplified template (note 
the double band in early cycles). The PCR conditions were as follows: 0.05 U/µL Vent polymerase,
260 nM each of dCTP, dATP, dGTP and TTP (bottom) or JTP (top), 4.7 µM primers, 1.6 nM
template, 10 mM KCl, 10 mM (NH4)2SO4, 20 mM Tris-HCl, 2 mM MgSO4, total volume 0.2 mL.
Aliquots (10 µL) were removed after each cycle and loaded onto an agarose gel (4%), together with
markers (50 base pairs, left and right lanes). The PCR cycles consisted of the following steps: 94°C (1 
min), 55°C (2 min), 72°C (2 min). 

Figure 3. Two motifs that bind ATP. Motif A is the wild type sequence reported by Huizenga and 
Szostak 15 as a typical ATP-binding motif obtained via in vitro selection from a standard, 
unfunctionalized DNA library. Motif B is a representative sequence obtained via a parallel in vitro 
selection experiment where the triphosphate of 5-(3"-aminopropynyl)-2'-deoxyuridine (J, Figure 1), a 
modified nucleoside with a side chain carrying a cationic functional group, is incorporated in the 
selection experiment instead of thymidine triphosphate. The motif is underlined, with the flanking 
regions included (not underlined). 
protected nucleoside derivative was prepared as follows.

5-(3-Trifluoroacetamidopropyn-1-yl)-2’-deoxyuridine, C14H14F3N3O6. 5-Iodo-2’-deoxyuridine
(Sigma, 1.412 mmol, 500 mg) was dissolved in dry DMF (12 mL). Ar was passed through this
solution for 10 min. (Ph3P)4Pd (0.1 equiv., 0.141 mmol, 163 mg) was added and Ar was passed
through the solution for another 5 min. Triethylamine (2.0 equiv., 2.825 mmol, 285 mg, 0.393 mL) was
added via syringe followed by addition of N-propynyltrifluoroacetamide 11 (2.5 equiv., 3.531
mmol, 533 mg) and CuI (0.2 equiv., 0.282 mmol, 53.7 mg). The mixture was stirred at 40°C
for 5 h, the solvent evaporated, and the residue dissolved in MeOH/methylene chloride 1:1 (10
mL). Ion exchange resin Bio-Rad AG1 X8 (HCO3- form, 1.5 g, prepared from the chloride
form by eluting through a column with a 16-fold volume of 1 M NH4HCO3 solution followed
by deionized water and finally with 0.5 M NH4HCO3 solution; no Cl- was detected) was added
to remove the Et3N·HI by-product, and the mixture stirred at room temperature for 30 min. The
mixture was filtered through Celite, the solid washed with MeOH/methylene chloride 1:1 (10
mL), and the combined filtrates evaporated. The residue was purified by column
chromatography (chloroform/MeOH 8.25:1.75). Rf: 0.42 (chloroform/MeOH 8.25:1.75).

Yield: >95% yellow foam (contains DMF). 1H-NMR (DMSO-d6): 2.13 (m, 2H, 2’), 3.60 (m,
2H, 5’), 3.81 (m, 1H, 4’), 4.24 (m, 3H, H-9, 3’), 5.12 (t, 1H, 5’-OH), 5.27 (d, 1H, 3’-OH),
6.11 (t, 1H, 1’), 8.22 (s, 1H, H-6), 10.09 (t, 1H, NH chain), 11.67 (s, 1H, NH cycl.).
13C-NMR (DMSO-d6): 29.5 (C-9), 40.7 (2’), 61.0 (5’), 70.2 (3’), 75.4 (C-8), 84.8 (1’), 87.5,
87.7 (4’, C-7), 97.7 (C-5), 113.9, 117.8 (q, CF3, J = 287.95 Hz), 144.2 (C-6), 149.5 (C-2),
155.9, 156.4 (q, COCF3, J = 37.25 Hz), 161.7 (C-4).

5-(3-Trifluoroacetamidopropyn-1-yl)-5’-O-dimethoxytrityl-2’-deoxyuridine, C35H32F3N3O8.
The product from above
(1.342 mmol, 505.9 mg) was coevaporated with pyridine, then dissolved in dry pyridine (10
mL) and cooled to 0°C. Et3N (2 equiv, 2.684 mmol, 271.1 mg, 0.373 mL), DMAP (0.25
equiv, 0.3355 mmol, 41 mg) and DMTCl (1.2 equiv, 1.61 mmol, 545.1 mg) were added and the
mixture stirred at 0°C for 5 min and at room temperature for 4 h. TLC (chloroform/10%
MeOH, Rf = 0.47) did not show any starting material. MeOH (2 mL) was added and the
mixture evaporated. The residue was extracted (ethyl acetate/aqueous NaHCO3 solution), the
combined organic layers washed with water, dried (Na2SO4) and the solvent evaporated. The
residue was purified by column chromatography (chloroform/10% MeOH) to give 902 mg
(99%) of product as a yellow foam. 1H-NMR (CDCl3): 2.44-2.61 (m, 2H, 2’), 3.34 (m, 2H,
5’), 3.73 (s, 6H, MeO), 3.89 (m, 1H, 4’), 4.14 (m, 2H, H-9), 4.59 (m, 1H, 3’), 6.34 (t, 1H,
1’), 6.80 (m, 4H, DMT), 7.14-7.33, 7.61-7.70 (m, 9H, DMT), 8.21 (s, 1H, H-6). 1H-NMR
ref. [1] (DMSO-d6/D2O): 2.28 (m, 2H, 2’), 3.08 (m, 1H, 5’), 3.28 (m, 1H, 5’), 3.91 (m, 1H,
4’), 4.06 (br s, 2H, H-9), 4.33 (m, 1H, 3’), 6.10 (t, 1H, 1’), 6.90 (m, 4H, DMT), 7.22-7.42
(m, 9H, DMT), 7.94 (s, 1H, H-6). 13C-NMR (CDCl3): 30.3 (C-9), 41.6 (2’), 55.2 (MeO), 63.5
(5’), 72.0 (3’), 75.3 (C-8), 86.0 (1’), 86.9 (Cq trityl), 87.0 (C-7), 87.3 (4’), 99.9 (C-5),
113.3 (DMT), 113.8, 117.6 (q, CF3, J = 286.52 Hz), 126.9, 127.8, 128.0, 129.9, 135.4 (all
DMT), 143.6 (C-6), 144.5 (DMT), 149.4 (C-2), 156.4, 156.9 (q, COCF3, J = 37.70 Hz),
158.5 (DMT), 162.6 (C-4).

3’-O-Acetyl-5-(3-trifluoroacetamidopropyn-1-yl)-5’-O-dimethoxytrityl-2’-deoxyuridine,
C37H34F3N3O8. The product from above (1.328
mmol, 902 mg) was coevaporated with pyridine and then dissolved in pyridine (10 mL) and
DMAP (0.25 equiv, 0.332 mmol, 40.5 mg), Et3N (2.5 equiv, 3.32 mmol, 335 mg, 0.462 mL)
and Ac2O (1.2 equiv, 1.594 mmol, 162.5 mg, 0.15 mL) were added. It was stirred at room
temperature for 2.5 h (TLC: chloroform/10% MeOH, Rf = 0.75), then MeOH (5 mL) was
added to stop the reaction and it was evaporated to dryness. The residue was extracted
(water/ethyl acetate), the organic layer dried and the solvent evaporated. The residue
was purified by column chromatography (chloroform/10% MeOH) to give 937 mg (98%) of
product as a yellow foam. 1H-NMR (CDCl3): 2.08 (s, 3H, Ac), 2.38-2.64 (m, 2H, 2’), 3.42
(m, 2H, 5’), 3.77 (s, 6H, MeO), 3.96 (m, 1H, 4’), 4.18 (m, 2H, H-9), 5.45 (m, 1H, 3’),
6.34 (t, 1H, 1’), 6.85 (m, 4H, DMT), 7.23-7.48, 7.63-7.71 (m, 9H, DMT), 8.20 (s, 1H,
H-6). 13C-NMR (CDCl3): 20.8 (Ac), 30.2 (C-9), 38.7 (2’), 55.1 (MeO), 63.5 (5’), 74.9 (3’),
75.1 (C-8), 84.4 (1’), 85.3 (4’), 87.1 (Cq trityl), 87.2 (C-7), 99.4 (C-5), 113.3 (DMT),
113.6, 117.4 (q, CF3, J = 287.42 Hz), 126.9, 127.7, 128.0, 129.9, 135.2 (all DMT), 143.1
(C-6), 144.3 (DMT), 149.4 (C-2), 156.3, 156.8 (q, COCF3, J = 37.78 Hz), 158.6 (DMT),
162.0 (C-4), 170.3 (Ac).

13. Ludwig, J.; Eckstein, F. J. Org. Chem. 1989, 54, 631-635.

14. (a) Horlacher, J.; Hottiger, M.; Podust, V.N.; Hübscher, U.; Benner, S.A. Proc. Natl. Acad.
Sci. USA 1995, 92, 6329-6333. (b) Lutz, M.J.; Held, H.A.; Hottiger, M.; Hübscher, U.;
Benner, S.A. Nucl. Acids Res. 1996, 24, 1308-1313.

15. Huizenga, D.E.; Szostak, J.W. Biochemistry 1995, 34, 656-665.

16. A library of DNA sequences was built to have 22-mer and 20-mer primer-binding regions 
sandwiching a random 70 base region containing the four natural bases in equal proportion. 

This was PCR amplified using JTP in place of TTP, with one primer tagged with biotin. The 
resulting double-stranded DNA library containing A, G, C, and J (J-DNA) was converted to the 
corresponding single stranded J-DNA library by loading the product on a streptavidin agarose 
column and eluting with 0.5 M NaOH. A preselection of the library was first performed against 
6-aminohexanoic acid N-hydroxysuccinimide ester-Sepharose 4B quenched (0.1 M Tris, pH
8.0, 0.5 M NaCl) and then equilibrated in selection buffer (250 mM NaCl, 50 mM Tris, pH
7.6, 5 mM MgCl2). The resulting library was then applied to the ATP-agarose column (N-8
linkage) and non-binding species were eluted with selection buffer. Binding components were
then eluted with ATP, ADP or AMP (respectively) in selection buffer. The binding species were 
PCR amplified to complete the first round of selection. The second round of selection followed 
with the application of the amplified binding species to the ATP-agarose column and continuing 
the iterative process. The resulting DNA was cloned into the vector pCR 2.1-TOPO using the 
TOPO TA Cloning Kit (Invitrogen). For the standard DNA pools 10 clones were sequenced per 
selection experiment, and for the J-DNA pools 15 clones were sequenced. The randomized 
region of the J-containing sequences had been noticeably shortened during the selection; its
average length after nine rounds of selection was 26 nucleotides compared to 70 nucleotides in 
the original library. Also, the content of the J nucleotide was reduced; instead of the expected 
25%, it was only about 14%. This result suggests biases against longer oligonucleotide 
aptamers and J, although these results do not permit a conclusion as to whether these biases 
arise in the selection or the amplification steps. 

17. Sassanfar, M. and Szostak, J. W. Nature 1993, 364, 550-553. 
Motif A

GTGCTTGGGGGAGTATTGCGGA 

GGAAAGCGGCCCTGCTGAAG 

Motif B 

GGTCGTCTAGAGTATGCGGTAG 
GA AGO JCAO JGGGGGGAGCAJA 
JGGJGJGAJACGCGA 
CCGAAGAAGC J JGGCCCA J G 
SINGLE BIOPOLYMER LIFE FORMS BASED ON RNA

In terms of its macromolecular chemistry, life on Earth can be classified as a "two-biopolymer" 
system. Nucleic acid is the encoding biopolymer, storing information within an organism and 
passing it to its descendants. Nucleic acids also direct the biosynthesis of the second biopolymer, 
proteins. Proteins generate most of the selectable traits in contemporary organisms, from structure to 
motion to catalysis. 

The two-biopolymer strategy evidently works rather well. It has lasted on Earth for several 
billion years, adapting in this time to a remarkable range of environments, surviving formidable 
geobiological (and perhaps cosmic) events that threatened its extinction, and generating intelligence 
capable of exploring beyond Earth. 

The terrestrial version of two-biopolymer life contains a well recognized paradox, however, one 
relating to its origins. It is difficult enough to envision a non-biological mechanism that would allow 
either proteins or nucleic acids to emerge spontaneously from non-living precursors. But it seems 
astronomically improbable that both biopolymers arose simultaneously and spontaneously, and even 
more improbable (if that can be imagined) that both biopolymers so arose with an encoder-encoded 
relationship. 

Accordingly, a variety of "single-biopolymer" models have been proposed as forms of life that 
antedated the two-biopolymer system. These (presumably) could have emerged more easily than a 
two biopolymer system. Such models postulate that a single biopolymer can perform the catalytic 
and information repository roles and undergo the Darwinian evolution that defines life (Joyce 1994). 
For example, Rich (1962), Woese (1967), Orgel (1968), and Crick (1968) proposed that the first
biopolymeric system that sustained Darwinian evolution on Earth was RNA. Usher (1976), White 
(1976), Visser and Kellogg (1978) and Benner et al. (1989) expanded on this proposal, recognizing 
that key elements of contemporary metabolism might be viewed as vestiges of an "RNA World" 
(Gilbert 1986), a time when the only encoded component of biological catalysis was RNA. The 
phenomenal discoveries by Cech, Altman, and their coworkers (Cech et al. 1981; Zaug et al. 1996; 
Guerrier-Takada et al. 1983) showing that RNA performs catalytic functions in contemporary 
organisms have made the "RNA World" a part of the culture of contemporary molecular biology
(Watson et al. 1987). 

The notion that the RNA World was metabolically complex follows from the abundance of its 
vestiges in modern metabolism (Benner 1988; Benner et al. 1989). RNA fragments play roles in
modern metabolism for which they are not intrinsically chemically suited, most notably in "RNA
cofactors" such as ATP, coenzyme A, NADH, FAD, and S-adenosylmethionine. This suggests that
these fragments originated during a time in natural history when RNA was the only available
biopolymer, rather than by convergent evolution or recruitment in an environment where chemically 
better suited biomolecules could be encoded. If the RNA World developed ATP, coenzyme A, 
NADH, and S-adenosylmethionine, it follows that the RNA World needed these for some purpose, 
presumably for phosphorylations, Claisen condensations, oxidation-reduction reactions, and methyl 
transfers (respectively) (White 1976; Visser and Kellogg 1978; Benner et al. 1989). This in turn 
implies complexity in the metabolism encoded by RNA-based life, implying in turn that RNA can 
catalyze a wide variety of chemical reactions. Conversely, the intellectual contribution of the "RNA 
World" model would be diminished were it not to embody a complex metabolism catalyzed by 
ribozymes, as there would then be no coherent explanation for the structures of contemporary RNA 
cofactors. 

Accordingly, hopes were high when Szostak (Szostak 1988), Joyce (1989a,b), Gold (Irvine et al. 
1991) and their coworkers introduced "in vitro selection" as a combinatorial tool to identify RNA 
molecules within a pool that catalyze specific reactions. Elegantly conceived, the approach seemed 
likely to lead to the ultimate goal, the generation of an RNA (or DNA) molecule that would catalyze 
the template-directed polymerization of RNA (or DNA), a molecular system able to undergo 
Darwinian evolution. If selection procedures were appropriately designed, they should also produce 
RNA catalysts for almost any other reaction as well, at least if the "RNA World" model as 
elaborated above were a correct representation of natural history. 

LIMITATIONS OF RNA AS A CATALYST 

In contrast with these hopes (and only by this contrast), in vitro selection has been disappointing. 
RNA has proven to be an intrinsically poor matrix for obtaining catalysis, especially when compared 
with proteins. For example, to have a 50% chance of obtaining a single RNA molecule capable of 
catalyzing a template-directed ligation reaction by a modest (by protein standards) factor of 10,000, 
Bartel and Szostak estimated that one must sift through 2 x 10^13 random RNA sequences 220 
nucleotides in length (Bartel & Szostak 1993). Although many laboratories have tried, only a few 
have managed to extend the scope of RNA catalysis beyond the phosphate transesterification 
reactions where it was originally observed. For example, attempts to obtain an RNA catalyst for a 
Diels-Alder reaction using in vitro selection failed (Morris et al. 1994); the same reaction is readily 
catalyzed by protein antibodies (Gouverneur et al. 1993). Attempts to obtain RNA that catalyzes 
amide synthesis have succeeded, but with difficulty (Zhang and Cech 1997; Wiegand et al. 1997). 
The fact that such successes came only after many attempts is indicative of a relatively poor 
catalytic potential in oligonucleotides. 
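
The Bartel-Szostak estimate can be restated as a frequency of active sequences. The following is a minimal sketch, assuming that every random sequence is an independent trial with the same probability p of being a ligase (an idealization introduced here, not a statement from the original work). The function names are hypothetical.

```python
import math


def per_sequence_probability(pool_size, chance=0.5):
    """Probability p that a single random sequence is active, given that a
    pool of `pool_size` independent sequences contains at least one active
    member with probability `chance`, i.e. 1 - (1 - p)**pool_size = chance."""
    # log1p/expm1 keep precision when p is many orders of magnitude below 1.
    return -math.expm1(math.log1p(-chance) / pool_size)


def pool_size_for(p, chance=0.5):
    """Inverse calculation: pool size needed to reach `chance` for a given p."""
    return math.log1p(-chance) / math.log1p(-p)


# Assumption: the 2 x 10^13 figure quoted above is the pool size at 50% chance.
p = per_sequence_probability(2e13)
print(f"fraction of active sequences: about {p:.1e}")
```

Under this idealization the frequency is close to ln 2 divided by the pool size, so each tenfold drop in the frequency of active sequences requires a tenfold larger library.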

The comparison with peptides is instructive. For example, short (14 amino acids) peptides 
accelerate the rate determining step for the amine-catalyzed decarboxylation of oxaloacetate by 
more than three orders of magnitude (Johnsson 1990; Johnsson et al. 1993), not far below the 
acceleration observed for the first generation ligases observed in the Bartel-Szostak selection 
beginning with 10^13 random RNA sequences. Further, the peptide is less than 10% the size of the 
RNA motif. Combinatorial experiments starting from this design (Perezpaya et al. 1996; L. Baltzer, 
personal communication) suggested that perhaps only 10^7 random sequences must be searched to 
obtain a catalytic effectiveness similar to that observed in a library of 10^13 RNA molecules. This suggests 
that peptides are intrinsically a million-fold fitter as catalysts than RNA. 
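
The arithmetic behind this comparison can be laid out explicitly. The sketch below uses only the numbers quoted above. Taking 220 nucleotides as the size of the RNA is an assumption made here, since the text gives the length of the random sequences rather than of the final motif, and the variable names are illustrative.

```python
import math

RNA_LIBRARY = 1e13       # random RNA sequences searched in the ligase selection
PEPTIDE_LIBRARY = 1e7    # random peptide sequences suggested to suffice
RNA_LENGTH = 220         # nucleotides (assumed size of the RNA, see lead-in)
PEPTIDE_LENGTH = 14      # amino acids

# Ratio of the library sizes needed for a comparable catalytic effect.
search_ratio = RNA_LIBRARY / PEPTIDE_LIBRARY

# Relative size of the two catalysts, counted in residues.
length_ratio = PEPTIDE_LENGTH / RNA_LENGTH

# Size of the complete sequence space for each polymer, as a power of ten.
log10_rna_space = RNA_LENGTH * math.log10(4)
log10_peptide_space = PEPTIDE_LENGTH * math.log10(20)

print(f"library ratio: {search_ratio:.0e}")
print(f"peptide/RNA length: {length_ratio:.1%}")
print(f"sequence space: RNA 10^{log10_rna_space:.0f}, peptide 10^{log10_peptide_space:.0f}")
```

The last line is a reminder that either library samples only a tiny fraction of its sequence space, which is one more reason to treat the million-fold figure as an order-of-magnitude estimate.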

The comparison is imperfect, of course, as it involves different reactions and different design 
strategies. This imperfection characterizes most of the comparisons that can be made at present. Not 
surprisingly, ribozymes are most frequently sought for reactions where oligonucleotides are most 
likely to be effective catalysts (for example, where oligonucleotides themselves are substrates), 
while peptide catalysts are most frequently sought for reactions suited for peptide catalysts (for 
example, those that make use of functional groups found on amino acid side chains). This makes the 
comparison non-quantitative, but useful nevertheless as an estimate of how well oligonucleotides 
and oligopeptides respectively perform when challenged by their favorite target reactions. 

THE CHEMISTRY OF FUNCTIONAL CATALYSIS 

The apparent superiority of proteins as catalysts compared with RNA reflects (at the very least) 
the wider range of building blocks and catalytic functionality available to proteins 
than to RNA. RNA lacks the imidazole, thiol, amino, carboxylate, and hydrophobic aromatic and 
aliphatic groups that feature so prominently in protein-based enzymes. RNA has only hydroxyl 
groups, polar aromatic groups, and phosphate groups. An uncounted number of studies with natural 
enzymes and their models has illustrated the use of this functionality by protein catalysts (Dugas 
1989). 

Proteins also have advantages as catalysts over nucleic acids in their greater propensity to "fold". 
As is well known from the statistical mechanics of polymers, the repeating negative charge of the 
polynucleotide backbone causes the polymer to favor an extended structure (Flory, 1953; Richert et 
al. 1996). Accordingly, the most prominent physical characteristics of nucleic acids are their 
solubilities in water, their ability to bind other oligonucleotides following simple rules, and 
constancy of physical behavior over a wide range of sequences. In contrast, the most prominent 
physical characteristic of peptides is their propensity to fold, best known as a propensity to 
precipitate (which is, of course, a type of "folding", in that peptide interacts with peptide rather than 
with water). A catalyst must fold if it is to surround a transition state and be effective, providing 
another reason why peptides might be intrinsically better catalysts than RNA (Benner 1989). 

If one must generate trillions of long, random RNA sequences in order to have a 50% likelihood 
of finding one that catalyzes even modestly a simple ligation (a reaction that itself assumes the pre- 
existence of long RNA molecules that act as templates and substrates) how many more random 
sequences must be generated to obtain a template-directed RNA polymerase? We cannot say, as 
such a ribozyme has not been generated. An optimistic guess is 10^20. This, the difficulty of 
obtaining plausible prebiotic syntheses of RNA molecules (but see Muller et al. 1990), and the 
observation that racemic mixtures of RNA do not effectively undergo abiological polymerization 
(see, for example, Schmidt et al. 1997) have prompted many to question the RNA World as a viable 
model for generating the first life on earth (Joyce et al. 1987; Miller 1997). The critique 
acknowledges the premise that a single-biopolymer system is more plausible as a first life form than 
a two-biopolymer system. It continues, however, by holding that the chemical properties of RNA are 
such that it could not have been the first living biopolymer, as it is too difficult to generate under 
abiotic conditions and provides too little catalytic power even if it could be generated. 
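
To give a sense of scale for these library sizes, the mass of RNA they represent can be estimated. This is a rough sketch under stated assumptions: one copy of each sequence, an average residue mass of about 330 g/mol, and a length of 220 nucleotides carried over from the ligase selection (the text does not give a length for the hypothetical polymerase). The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
NUCLEOTIDE_MASS = 330.0       # g/mol per residue, approximate average (assumption)


def library_mass_grams(n_sequences, length_nt, copies=1):
    """Mass of a library holding `copies` of each of `n_sequences` RNA
    molecules, each `length_nt` nucleotides long."""
    return n_sequences * copies * length_nt * NUCLEOTIDE_MASS / AVOGADRO


# 2 x 10^13: the ligase estimate; 10^20: the optimistic polymerase guess.
for n in (2e13, 1e20):
    print(f"{n:.0e} sequences of 220 nt: {library_mass_grams(n, 220):.2e} g")
```

On these assumptions the ligase library is a matter of micrograms, while a library of 10^20 sequences approaches ten grams or more of RNA in which every molecule is different.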

EXPANDING THE STRUCTURAL REPERTOIRE OF NUCLEIC ACIDS 

A decade ago, the intrinsic limitations of standard nucleic acids as a biopolymer for obtaining 
functional behavior under conditions of Darwinian selection were discussed, and several solutions to 
these limitations proposed (Benner et al., 1987; Benner 1988; Benner 1989; Switzer et al. 1989; 
Piccirilli et al. 1990). Each of these involved an expedient by which additional functionality was 
provided to the RNA. 

One expedient was obvious. RNA might gain functionality using cofactors, much as 
contemporary proteins gain the functionality that they lack through vitamins. 

A second solution was to append functionality to the standard nucleotides. Prompting this 
suggestion was the observation that contemporary tRNA and rRNA contain much of the 
functionality found in proteins but lacking in contemporary encoded RNA, including amino, 
carboxylate, and aliphatic hydrophobic groups (Figure 1) (Limbach et al. 1994). These functional 
groups are introduced by post-transcriptional modification of encoded RNA. Some of these might 
even be placed by parsimony in the protogenome, the reconstructable genome at the trifurcation in 
the evolutionary tree joining the archaebacterial, eubacterial, and eukaryotic kingdoms (Benner et al. 
1989; Limbach et al. 1994). 

The third approach to expand the functional diversity of nucleic acids pursued the possibility of 
expanding the number of base pairs from the four found in standard oligonucleotides to include 
some of the non-standard hydrogen bonding patterns permitted by the geometry of the Watson-Crick 
base pair (Figure 2) (Switzer et al., 1989; Piccirilli et al. 1990). Additional letters in the genetic 
alphabet could carry a richer diversity of functionality. Indeed, one might imagine a new type of 
biopolymer, one carrying functionalization like proteins but able to be copied like nucleic acids 
(Figure 3) (Kodra and Benner 1997). 

In a sense, the first approach had already been implemented in 1987. Most ribozymes require one 
or more metal ions to be effective catalysts. The metal ions are not encoded in the RNA sequence, 
provide a needed electrophilic center, and therefore compensate for the limited catalytic 
functionality of the biopolymer itself. Thus, metals can be considered to be "cofactors", and clearly 
improve the catalytic functionality of RNA. More recently, Breaker and his coworkers have 
expanded the approach to include organic molecules as second ligands in ribozymes (Tang & 
Breaker 1997). 

In contrast, the second and third approaches were far from implementation in 1987. While 
standard bases carrying functionality were known to form stable base pairs and, in some cases, be 
accepted by polymerases (Prober et al. 1987), it was not clear that non-standard bases would pair as 
expected, or whether polymerases would incorporate functionalized standard bases and non-standard 
bases (Figures 2 and 3) with sufficient speed and fidelity to be used in in vitro selection 
experiments. Further, it was not known whether in vitro selection based on an expanded genetic 
alphabet might improve the binding and catalytic versatility of RNA. 

Developing in vitro selection with an expanded genetic alphabet proved to be more difficult than 
developing in vitro selection with the standard nucleotides (A, T, G and C), which was enabled by a 
rich collection of molecular biological tools. Non-standard nucleobases needed to be synthesized 
(Switzer et al. 1989; Piccirilli et al. 1990; Vogel et al. 1993a; Vogel et al. 1994). Their structures 
needed to be optimized for stability and pairing (Piccirilli et al. 1991; Vogel et al. 1993b). New 
protecting group chemistry needed to be developed to permit automated synthesis of 
oligonucleotides containing them (Huang and Benner 1993; von Krosigk and Benner 1995). 
Polymerases were needed to catalyze their incorporation into oligonucleotides by the polymerase 
chain reaction (Horlacher et al. 1995; Lutz et al. 1996). These studies have been paralleled by work 
to append still more functionality onto standard nucleobases (Dewey et al. 1996; Kodra and Benner, 
1997). These experiments have established the chemistry of both functionalized standard and non- 
standard nucleotides, and laid the ground for the first in vitro selection experiments using these. 


THE RNA WORLD HAD THE MOTIVE TO EXPLOIT MODIFIED NUCLEOTIDES 

With these chemical developments, it has been possible recently to make a convincing, if not 
compelling, argument that the RNA World had both the motive and the opportunity to exploit non- 
standard and functionalized nucleobases. Three results are central to this argument. 

First, functionality has been incorporated into an RNA molecule that catalyzes a Diels-Alder 
reaction (Tarasow et al. 1997), starting from a functionalized standard pyADA nucleobase (Figure 4, 
right). A selection starting with a library that did not contain functionalized nucleotides failed to 
yield a catalyst (Morris et al. 1994). The successful experiment with the functionalized pyADA base 
selected directly for a Diels-Alderase, however, while the experiment on the unfunctionalized library 
sought a Diels-Alderase by selecting for RNA molecules that bound to a transition state analog for 
the reaction. The different selection strategies prevent us from saying conclusively that this 
particular functionalized nucleoside improves the intrinsic power of RNA as a catalyst for 
Diels-Alder reactions. Experiments that bear on this question will undoubtedly emerge soon. 

Another functionalized selection experiment does support this conclusion. Burgstaller, Jurczyk, 
Battersby, and Benner prepared a different functionalized implementation of the pyADA nucleobase 
(trivially designated "J", Figure 4) and incorporated it into an in vitro selection experiment seeking 
receptors for an adenosine derivative (unpublished). This experiment was done in strict parallel with 
experiments done by Huizenga and Szostak (1995) using a standard, unfunctionalized DNA library. 

The functionalized library containing J yielded new motifs as receptors for ATP, including the 
following (the randomized region is underlined): 

GGTCGTCTAGAGTATGCGGTAG GAACG J CAG J GGGGGGAGCA JA JGG J G J GA JACGCGA CCGAAGAAGC J JGGCCCA JG 
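
As a bookkeeping aid, the sequence above can be handled as a plain string. The sketch below assumes that the spaces in the printed sequence are typesetting residue and carry no meaning, and that "J" marks every functionalized position. It locates the J residues and builds a control sequence in which unfunctionalized T replaces J. The variable names are illustrative.

```python
# Sequence exactly as printed above, spaces included.
PRINTED = (
    "GGTCGTCTAGAGTATGCGGTAG GAACG J CAG J GGGGGGAGCA JA JGG J G J GA "
    "JACGCGA CCGAAGAAGC J JGGCCCA JG"
)

# Assumption: the spaces are not part of the sequence.
aptamer = PRINTED.replace(" ", "")

# 1-based positions of the functionalized J residues.
j_positions = [i + 1 for i, base in enumerate(aptamer) if base == "J"]

# Control sequence: unfunctionalized T in place of every J.
control = aptamer.replace("J", "T")

print(f"length: {len(aptamer)} nt, J residues: {len(j_positions)}")
print("J positions:", j_positions)
print("control:", control)
```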

The motif prepared with unfunctionalized T replacing J does not bind ATP, suggesting that the 
ammonium functionality carried by J is essential for the binding properties of the new motif. A gel 
filtration experiment was used to obtain an equilibrium dissociation constant (Kd) of 40 nM for the 
binding of this aptamer to ATP. This corresponds to an affinity approximately two orders of magnitude 
greater than the affinities reported for RNA (Sassanfar & Szostak 1993) and DNA aptamers containing only 
standard bases (Huizenga and Szostak, 1995). With the caveats that elution experiments 
permit only estimates of binding constants, and that further experiments with a wider range of 
ligands must be completed, these results suggest that introduction of a new functionality (an 
ammonium group bearing a positive charge) enhances the intrinsic value of a DNA library as a 
source of receptors by ca. two orders of magnitude. 
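
The binding data can also be expressed as free energies. This is a minimal sketch, assuming a temperature of 25 °C and a 1 M standard state (neither is stated in the text). The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard free energy of binding, RT ln(Kd / 1 M), in kcal/mol.
    More negative values mean tighter binding."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def free_energy_gain(fold_improvement, temperature_k=298.15):
    """Extra binding free energy (kcal/mol) that corresponds to a given
    fold decrease in Kd."""
    return R_KCAL * temperature_k * math.log(fold_improvement)


dg = binding_free_energy(40e-9)   # the 40 nM value quoted above
ddg = free_energy_gain(100)       # "two orders of magnitude"
print(f"dG for Kd = 40 nM: {dg:.1f} kcal/mol")
print(f"gain for a 100-fold lower Kd: {ddg:.1f} kcal/mol")
```

On these assumptions a 100-fold improvement is worth somewhat less than 3 kcal/mol, an amount that a small number of additional favorable contacts could plausibly supply, which fits the suggestion that the ammonium group contributes directly to binding.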

These experiments make clear that functionalized oligonucleotides are superior to standard 
oligonucleotides as a matrix for generating receptors and catalysts. This would have given the RNA 
World a motivation to have used functionalized oligonucleotides and an expanded genetic alphabet 
in its effort to generate diverse catalysts. 

But did it? The third result comes from the field of "prebiotic chemistry", which seeks to 
discover ways by which the components of living systems might have emerged on the early Earth. 
Robertson and Miller (1995) showed how the intrinsic nucleophilicity of the 5-position of 
pyrimidines such as uracil might be exploited to generate functionalized uracil derivatives that carry 
positive charges at the 5-position under abiological conditions. Analogous chemistry can be used to 
generate other functionalized derivatives. The products resemble the amino group functionalized 
uracils found in some tRNA molecules (Figure 1). This suggests that the RNA World may have had 
the opportunity to use some functionalized nucleosides when life first emerged on Earth. 

Could non-standard nucleobases (Figure 2) also have been available during early episodes of life 
on Earth? The success of prebiotic chemists in generating organic species under "prebiological" 
conditions has expanded greatly the spectrum of molecules that might have been accessible to early 
life. Indeed, prebiotic chemistry might have been too successful, in that relatively simple prebiotic 
models can generate organic mixtures containing perhaps too many products (Khare et al. 1993). 
Contemporary prebiotic chemistry must become less an effort to show that a given moiety might be 
generated under prebiotic conditions, and more an effort to show how a useful moiety (such as a 
heterocycle or a ribose) arising under prebiotic conditions might be converted into one or more of its 
delicate derivatives (such as nucleosides) in the presence of the organic "gunk" that emerges from a 
typical prebiotic experiment. 

Notwithstanding these issues, several of the non-standard nucleobases in Figure 2 do not appear 
to be less "prebiotic" than the standard nucleobases. The puADA nucleobase is, for example, a 
simple deamination product of the puADD base (also known as guanine). Thus, if guanine was 
generated on a prebiotic earth, puADA was a fortiori also generated on a prebiotic earth. Similar 
arguments can be made for the puDDA and pyAAD nucleobases. This suggests that if the RNA 
World had the opportunity to use the standard genetic alphabet, then it may also have had the 
opportunity to use an expanded genetic alphabet. 

CONTRADICTING CHEMICAL REQUIREMENTS FOR CATALYSIS AND 
INFORMATION STORAGE 

This evidence suggests that the RNA World had both access to a functionalized and/or expanded 
genetic alphabet and the motivation to use it. The case is made stronger by the functionalized 
nucleotides found in contemporary tRNA and rRNA (Figure 1), presuming that these are vestiges of 
an RNA World. 

Even assuming that further experimental work demonstrates the full catalytic potential of 
functionalized and expanded genetic alphabets, it is still not clear that they will support single- 
biopolymer systems of life, however. To support a self-sustaining chemical system capable of 
undergoing Darwinian evolution (Joyce 1994), a biopolymer must be able to search mutation-space 
independent of concern that it will lose properties essential for replication. We designate polymers 
that have this property as COSMIC-LOPER biopolymers ("Capable of Searching Mutation-space 
Independent of Concern over Loss Of Properties Essential for Replication"), and comment briefly 
on the chemical constraints placed upon biopolymers likely to have this property. 

The need for the single biopolymer to be COSMIC-LOPER to support Darwinian evolution is 
nearly axiomatic. If a substantial fraction of the mutations possible within a genetic information 
system cause a biopolymer to precipitate, unfold, or otherwise no longer be recognizable by the 
catalyst responsible for replication, then the biopolymer cannot evolve. 

Curiously, catalysis on one hand and information storage on the other place competing and 
contradictory demands on molecular structure that make a single molecule that does both difficult to 
find. Specifically: 

1. A biopolymer specialized to be a catalyst must have many building blocks, so that it can display 
a rich versatility of chemical reactivity. A biopolymer specialized to store information must have 
few building blocks, as a way of ensuring faithful replication (Szathmary 1992; Lutz et al. 1996). 

2. A biopolymer specialized to be a catalyst must fold easily so that it can form an active site. A 
biopolymer specialized to store information should not fold easily, so that it can serve as a 
template. 

3. A biopolymer specialized for catalysis must be able to change its physical properties rapidly with 
few changes in its sequence, enabling it to explore "function space" during divergent evolution. 
A biopolymer specialized to encode information must be COSMIC-LOPER, with its physical 
properties largely unchanged even after substantial change in its sequence, so that the polymer 
remains acceptable to the mechanisms by which it is replicated. 
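
The first of these tensions can be put in numbers. The sketch below is an illustration only: it compares alphabets of 4 letters (standard nucleotides), 12 letters (the bases of Figure 2), and 20 letters (amino acids). Counting the incorrect partners at each position is a crude proxy for the burden on replication fidelity that is introduced here for illustration, and the function names are hypothetical.

```python
import math

ALPHABETS = {
    "standard nucleotides": 4,
    "expanded alphabet (Figure 2)": 12,
    "amino acids": 20,
}


def bits_per_position(alphabet_size):
    """Information one position can carry, in bits."""
    return math.log2(alphabet_size)


def log10_sequence_space(alphabet_size, length):
    """Number of distinct sequences of a given length, as a power of ten."""
    return length * math.log10(alphabet_size)


def wrong_partners(alphabet_size):
    """Incorrect building blocks that must be rejected at each position
    during copying (crude fidelity proxy, an assumption of this sketch)."""
    return alphabet_size - 1


for name, k in ALPHABETS.items():
    print(
        f"{name}: {bits_per_position(k):.2f} bits/position, "
        f"10^{log10_sequence_space(k, 50):.0f} sequences of length 50, "
        f"{wrong_partners(k)} wrong partners per position"
    )
```

A larger alphabet buys chemical diversity and a larger sequence space per residue, but every added letter is one more competitor that the copying machinery must exclude at each position.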

At the very least, a single biopolymer attempting to support Darwinian evolution must reflect 
some sort of structural compromise between these goals. No fundamental principle guarantees that a 
polymeric system will make this compromise in a satisfactory way, however. The demands for 
functional diversity, folding, and rapid search of function space might be so stringent, and the 
demands for few building blocks, templating ability, and COSMIC-LOPER ability so stringent, that 
no biopolymer structure achieves a suitable compromise. 

Nor need a biopolymer exist that supports robust catalysis at the same time as it enables robust 
Darwinian evolution. If so, the single-biopolymer model for the origin of life would be unavailable 
as a solution to the "chicken-or-egg" paradox in the origin of two-biopolymer systems. Life would 
be scarce in the universe. And if a single biopolymer system did arise, it would be poorly adaptable 
and easily extinguished by geobiological (and possibly cosmogenic) events. Conversely, if many 
polymeric systems exist that make an acceptable compromise between the demands of catalysis and 
the demands of information storage, life would have emerged rapidly via single-biopolymer forms 
and be abundant in the universe. 

It is clear that proteins are not COSMIC-LOPER polymers, even in cases where they can direct 
template-based replication (Lee et al. 1997). The physical properties of proteins (including their solubility) 
can change dramatically upon point mutation within the mutation space allowed by the 20 standard 
amino acids. Again, there are many examples of this phenomenon, but the peptides mentioned above 
that catalyze the decarboxylation of oxaloacetate are one. Altering their structure by a single acetyl 
group changes substantially their level of aggregation, while altering their internal sequence at a 
single residue changes substantially their helicity (Allemann 1989; Johnsson 1990; Johnsson et al. 
1993). If solubility and/or helicity are essential to the replicability of a peptide template, a large 
range of plausible mutations would destroy it. 

Natural oligonucleotides do not behave similarly. Indeed, molecular biologists rely on this fact. 
Every (or almost every) oligonucleotide will precipitate in ethanol. Every (or almost every, if we 
consider G-rich sequences (Wang and Patel 1994)) oligonucleotide will bind to its complement in a 
rule-based fashion. Every (or almost every) oligonucleotide will be a template for a polymerase. 
Every (or almost every) oligonucleotide will migrate as expected on an electrophoresis gel. This 
regularity is normal for oligonucleotides, but is exceptional for virtually every other class of organic 
molecule. 

Even small steps taken from the natural backbone can destroy the COSMIC-LOPER properties 
of oligonucleotides. For example, recent work replaced the phosphate diester linkers in DNA and 
RNA by non-ionic dimethylenesulfone linking units (Huang et al. 1991). The sulfone group is an 
"isosteric" and "isoelectronic" replacement for a phosphate. Nevertheless, these non-ionic oligomers 
display some remarkable properties. First, they fold. For example, the octamer 
A-SO2-U-SO2-G-SO2-G-SO2-U-SO2-C-SO2-A-SO2-U folds in water to give a form having a 
high melting temperature (ca. 87 °C) (Richert et al. 1996). Next, a synthetic intermediate leading to 
this oligosulfone was found to be a "catalyst" for a self-debenzoylation reaction (Richert et al. 
1996). Still more remarkably, different oligosulfones evidently follow different strategies for folding 
and pairing. The dinucleotide analog G-SO2-C in the crystal forms an antiparallel duplex 
approximately isomorphous with the analogous RNA (Roughton et al. 1995). In the crystal, the 
A-SO2-T dinucleotide does not (Hyrup et al. 1995). The U-SO2-C dinucleotide forms a complex 
featuring backbone-to-backbone and backbone-to-nucleobase hydrogen bonds (C. Richert, personal 
communication). Even within a relatively small search of sequence space, these non-ionic 
oligonucleotide analogs retain no conformational or physical property that could be a ready basis for 
a common mechanism for replication. In this respect, oligosulfone analogs of DNA and RNA 
behave much the same as peptides and conventional small organic molecules, not the nucleic acids 
upon which they are modelled. 

These results suggest that the need for a COSMIC-LOPER behavior is a strong constraint on 
what biopolymers might serve as the basis for single-biopolymer life. They also suggest that a 
polyelectrolyte (polyanion or polycation) structure is important for the COSMIC-LOPER behavior 
that we see in standard nucleic acids (Richert et al. 1996): 

(a) Phosphate groups force the interaction surface between strands as far distant from the 
backbone as possible, to the Watson-Crick "edge" of the nucleobases. Without interstrand 
phosphate-phosphate repulsion, sugar-sugar interstrand interactions, sugar-backbone interstrand 
interactions, interactions between the sugar and backbone groups of one strand and the Hoogsteen 
edge of the nucleobases on the other, Hoogsteen-Hoogsteen interstrand interactions, and 
Watson-Crick/Hoogsteen interstrand interactions all become important, and the recognition phenomenon 
ceases to be rule-based. 

(b) Phosphates discourage folding in an oligonucleotide molecule. The statistical mechanical 
theory of polymers suggests that the polyanionic backbone will cause natural oligonucleotides to 
adopt an extended structure (Flory 1953; Brant and Flory 1965). Non-ionic oligonucleotide analogs 
should (and do) fold like peptides. By discouraging folding, the repeating polyanionic backbone 
helps oligonucleotides act as templates. 

(c) The charge distribution in a molecule can be described as an infinite series (monopole + dipole + 
quadrupole + ...). The first non-vanishing term dominates. The repeating monopole (charge) in DNA 
makes dipolar interactions (hydrogen bonding) secondary to its properties, allowing the DNA 
molecule to mutate without changing greatly its physical behavior. 
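
Point (c) can be illustrated by comparing the first two terms of the series. The sketch below is a simplified picture, assuming a single elementary charge and a point dipole of 1 debye in vacuum, with no solvent or counterion screening. The dipole magnitude is an illustrative choice and the function names are hypothetical.

```python
COULOMB_CONSTANT = 8.9875517923e9      # N m^2 / C^2
ELEMENTARY_CHARGE = 1.602176634e-19    # C
DEBYE = 3.33564e-30                    # C m


def monopole_potential(charge, distance_m):
    """Electrostatic potential (V) of a point charge; falls off as 1/r."""
    return COULOMB_CONSTANT * charge / distance_m


def dipole_potential_on_axis(dipole, distance_m):
    """Potential (V) of a point dipole along its axis; falls off as 1/r^2."""
    return COULOMB_CONSTANT * dipole / distance_m**2


def dipole_to_monopole_ratio(distance_m, dipole=DEBYE, charge=ELEMENTARY_CHARGE):
    """How large the dipole term is relative to the charge term."""
    return dipole_potential_on_axis(dipole, distance_m) / monopole_potential(charge, distance_m)


for r_nm in (0.3, 1.0, 3.0):
    ratio = dipole_to_monopole_ratio(r_nm * 1e-9)
    print(f"r = {r_nm} nm: dipole/monopole = {ratio:.3f}")
```

Because the ratio falls as 1/r, the charge term increasingly outweighs the dipole term at longer range, which is the sense in which a repeating backbone charge can mask sequence-dependent dipolar differences.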

Returning to functionalized and expanded genetic alphabets (Figures), this discussion suggests 
that one must be careful when "decorating" oligonucleotides with functionality. At some level of 
functionalization, the COSMIC-LOPER properties that enable DNA and RNA to serve as an 
evolvable Darwinian system will be lost. Preliminary data suggest, for example, that extensive 
functionalization with hydrophobic side chains destroys these properties. It remains to be seen 
whether the level of functionality that must be introduced into DNA and RNA to enable it to support 
a complicated metabolism is greater than that required to destroy its COSMIC-LOPER properties. 

CAN A SINGLE-BIOPOLYMER LIFE BE FOUND TODAY IN THE SOLAR SYSTEM? 

"Single-biopolymer" models for Darwinian chemistry have relevance to the search for 
extraterrestrial life. For example, biologists have noted that the microfossils in the Allan Hills 
meteorite, which are as small as 20-100 nanometers across, are too small to be living cells (Kerr 
1997). After all, the argument is made, the ribosome is 25 nm across, and ribosomes are a basic 
requirement for life. 
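
The size argument can be made concrete with a volume comparison. The sketch below treats both the structure and the ribosome as spheres and ignores packing efficiency, membranes, and every other cell component, so the result is a generous upper bound and not a realistic count. The function name is hypothetical.

```python
RIBOSOME_DIAMETER_NM = 25.0   # value quoted in the argument above


def ribosome_volume_equivalents(structure_diameter_nm,
                                ribosome_diameter_nm=RIBOSOME_DIAMETER_NM):
    """Volume of a spherical structure in units of one ribosome volume.
    Values below 1 mean that not even a single ribosome would fit."""
    return (structure_diameter_nm / ribosome_diameter_nm) ** 3


# The reported size range of the meteorite structures.
for diameter in (20, 50, 100):
    n = ribosome_volume_equivalents(diameter)
    print(f"{diameter} nm structure: {n:.2f} ribosome volumes")
```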

This argument is, of course, narrowly formulated. Ribosomes are a basic requirement for life 
based on two biopolymers. If a single biopolymer (such as RNA) can serve both genetic and 
catalytic functions, then ribosomes are not needed for life. Indeed, much of the metabolism of 
contemporary cells (aminoacyl tRNA synthetases and many amino acid biosynthesis enzymes, for 
example) comprising more than half of what is believed to be the "core metabolism" encoded by the 
protogenome (Benner et al. 1993) would also not be needed for life in an RNA World. A cell based on 
a "single biopolymer" genetic system can be far smaller than one based on two biopolymers. This 
means that the "fossils" in the Martian meteorite are not too small to be remnants of a single- 
biopolymer form of life. Conversely, if the meteorite structures are indeed fossils, then they almost 
certainly are fossils of an organism that used only a single biopolymer as its molecular system 
capable of Darwinian evolution, and similar considerations should guide our search for non-terrean 
life. 

The best place to search for single-biopolymer life may be here on Earth, however, assuming that 
terrestrial life originated here as a single-biopolymer Darwinian system. Whether such life remains 
on Earth depends on whether it was able to find a niche on the planet where it could compete with 
its descendants that developed two biopolymers. The superior power of proteins as catalysts 
provides presumptive arguments that a life form that did not exploit proteins as catalysts could not 
have competed with life that did. The biochemical innovation associated with translation almost 
certainly prompted an extinction more massive than the well known extinctions at the end of the 
Cretaceous period. 

A variety of ecological niches might provide single-biopolymer systems with an adaptive 
advantage over two-biopolymer systems, however, and may have provided ribo-organisms with the 
opportunity to survive on Earth even in the presence of two-biopolymer systems. For example, 
because cells containing single-biopolymer life can be much smaller than two-biopolymer cells, 
one-biopolymer life might have survived where small size offers a selective advantage. In 
subterranean matrices, for example, geological formations can have pore sizes that are too small to 
permit a two-biopolymer organism to live, but might permit a single-biopolymer cell to reside free 
from competition from its more adept protein-using cousins. 

CONCLUSIONS 

Experimental results suggest that the RNA World had both the opportunity and the motivation to 
use an expanded genetic alphabet. It remains to be seen how effectively functionalized 
oligonucleotides make a compromise between the structural demands for catalysis and the physical 
properties required for effective Darwinian evolution. Should experimental work show that they do 
so, we expect in vitro selections to provide effective new catalysts with the expanded genetic 
alphabet. In the most optimistic scenario, analogous single-biopolymer forms of life may be found 
elsewhere in the solar system, and perhaps in enclaves on planet Earth. 

FIGURE LEGENDS 


Figure 1. Transfer RNA contains a rich collection of functionalized standard nucleobases, created 
by post-transcriptional modification, that deliver functional groups (amino groups, carboxylic acid 
groups, aliphatic hydrophobic groups, in green) not found within unmodified RNA. Could these be 
vestiges of functionalized RNA originating in the RNA World? 

Figure 2. 12 bases that are possible in a DNA- or RNA-based "alphabet" within the constraints of 
the Watson-Crick base pair geometry. Pyrimidine base analogs are designated by "py", purine by 
"pu". The upper case letters following the designation indicate the pattern of hydrogen bonding 
acceptor (A) (in blue) and donor (D) (in red) groups. Thus, cytosine is pyDAA, guanine is puADD, 
adenine is puDA- (diaminopurine, puDAD, completes the Watson-Crick base pair), and thymine is 
pyADA. The remainder of the base pairs are joined by non-standard hydrogen bonding schemes. 

Figure 3. Non-standard and standard nucleobases with functionality (in green). Note that the 
pyDAD nucleobase can be protonated below pH 7 (pKa = 7.4). 

Figure 4. Functionalized standard bases that have been used in in vitro selections. Functional groups 
are shown in green, with the hydrogen bonding acceptor and donor in blue and red, respectively. 